Feature and Score Level Combination of Subspace Gaussians in Lvcsr Task
نویسندگان
چکیده
In this paper, we investigate employment of discriminatively trained acoustic features modeled by Subspace Gaussian Mixture Models (SGMMs) for Rich Transcription meeting recognition. More specifically, first, we focus on exploiting various types of complex features estimated using neural network combined with conventional cepstral features and modeled by standard HMM/GMMs and SGMMs. Then, outputs (word sequences) from individual recognizers trained using different features are also combined on a score-level using ROVER for the both acoustic modeling techniques. Experimental results indicate three important findings: (1) SGMMs consistently outperform HMM/GMMs (relative improvement on average by about 6% in terms of WER) when both techniques are exploited on single features; (2) SGMMs benefit much less from feature-level combination (1% relative improvement) as opposed to HMM/GMMs (4% relative improvement) which can eventually match the performance of SGMMs; (3) SGMMs can be significantly improved when individual systems are combined on a score-level. This suggests that the SGMM systems provide complementary recognition outputs. Overall relative improvements of the combined SGMM and HMM/GMM systems are 21% and 17% respectively compared to a standard ASR baseline.
منابع مشابه
Image Classification via Sparse Representation and Subspace Alignment
Image representation is a crucial problem in image processing where there exist many low-level representations of image, i.e., SIFT, HOG and so on. But there is a missing link across low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principle component analysis are employed to d...
متن کاملDevelopment of the GALE 2008 Mandarin LVCSR system
This paper describes the current improvements of the RWTH Mandarin LVCSR system. We introduce vocal tract length normalization for the Gammatone features and present comparable results for Gammatone based feature extraction and classical feature extraction. In order to benefit from the huge amount of data of 1600h available in the GALE project we have trained the acoustic models up to 8M Gaussi...
متن کاملRapid speaker adaptation using MLLR and subspace regression classes
In recent years, various adaptation techniques for hidden Markov modeling with mixture Gaussians have been proposed, most notably MAP estimation and MLLR transformation. When the amount of adaptation data is limited, adaptation can be done by grouping similar Gaussians together to form regression classes and then transforming the Gaussians in groups. The grouping of Gaussians is often determine...
متن کاملA study on LVCSR and keyword search for tagalog
We describe a state-of-the-art large vocabulary continuous speech recognition (LVCSR) and keyword search (KWS) system trained on roughly 70 hours of conversational telephone speech. Using the Kaldi speech recognition toolkit, we investigate several aspects: for the acoustic front-end, we analyze the use of mel-frequency cepstral coefficients (MFCC), pitch and probability-of-voicing (PoV), and d...
متن کاملFeature-rich sub-lexical language models using a maximum entropy approach for German LVCSR
German is a morphologically rich language having a high degree of word inflections, derivations and compounding. This leads to high out-of-vocabulary (OOV) rates and poor language model (LM) probabilities in the large vocabulary continuous speech recognition (LVCSR) systems. One of the main challenges in the German LVCSR is the recognition of the OOV words. For this purpose, data-driven morphem...
متن کامل